13 research outputs found

Memory-Efficient Recursive Evaluation of 3-Center Gaussian Integrals

Author: Asadchev Andrey
Valeev Edward F.
Publication date: 06/10/2022

To improve the efficiency of Gaussian integral evaluation on modern accelerated architectures, FLOP-efficient Obara-Saika-based recursive evaluation schemes are optimized for the memory footprint. For the 3-center 2-particle integrals that are key for the evaluation of Coulomb and other 2-particle interactions in the density-fitting approximation, the use of multi-quantal recurrences (in which multiple quanta are created or transferred at once) is shown to produce significant memory savings. Other innovations include leveraging register memory for reduced memory footprint and direct compile-time generation of optimized kernels (instead of custom code generation) with compile-time features of modern C++/CUDA. High efficiency of the CPU- and CUDA-based implementation of the proposed schemes is demonstrated both for individual batches of integrals involving Gaussians with low and high angular momenta (up to L = 6) and contraction degrees, and for the density-fitting-based evaluation of the Coulomb potential. The computer implementation is available in the open-source LibintX library. Comment: 37 pages, 2 figures, 6 tables

arXiv.org e-Print Archive

Uncontracted Rys Quadrature Implementation of up to G Functions on Graphical Processing Units

Author: Allada Veerendra
Asadchev Andrey
Bode Brett M.
Felder Jacob
Gordon Mark S.
Windus Theresa Lynn
Publication venue: Iowa State University Digital Repository
Publication date: 01/02/2010

An implementation is presented of an uncontracted Rys quadrature algorithm for electron repulsion integrals, including up to g functions, on graphical processing units (GPUs). The general GPU programming model, the challenges associated with implementing the Rys quadrature on these highly parallel emerging architectures, and a new approach to implementing the quadrature are outlined. The performance of the implementation is evaluated for single and double precision on two different types of GPU devices. The performance obtained is on par with the matrix−vector routine from the CUDA basic linear algebra subroutines (CUBLAS) library.

Digital Repository @ Iowa State University (ISU)

New Multithreaded Hybrid CPU/GPU Approach to Hartree−Fock

Author: Asadchev Andrey
Gordon Mark S.
Publication venue: Iowa State University Digital Repository
Publication date: 01/09/2012

In this article, a new multithreaded Hartree–Fock CPU/GPU method is presented which utilizes automatically generated code and modern C++ techniques to achieve a significant improvement in memory usage and computer time. In particular, the newly implemented Rys Quadrature and Fock Matrix algorithms, implemented as a stand-alone C++ library with C and Fortran bindings, provide up to 40% improvement over the traditional Fortran Rys Quadrature. The C++ GPU HF code provides approximately a factor of 17.5 improvement over the corresponding C++ CPU code.

Digital Repository @ Iowa State University (ISU)

Crossref

Distributed Memory, GPU Accelerated Fock Construction for Hybrid, Gaussian Basis Density Functional Theory

Author: Asadchev Andrey
Clark David
de Jong Wibe A.
Popovici Doru Thom
Valeev Edward F.
Waldrop Johnathan
Williams-Young David B.
Windus Theresa
Publication venue: AIP Publishing
Publication date: 24/03/2023

With the growing reliance of modern supercomputers on accelerator-based architectures such as GPUs, the development and optimization of electronic structure methods to exploit these massively parallel resources has become a recent priority. While significant strides have been made in the development of GPU-accelerated, distributed memory algorithms for many-body (e.g. coupled-cluster) and spectral single-body (e.g. planewave, real-space and finite-element density functional theory [DFT]) methods, the vast majority of GPU-accelerated Gaussian atomic orbital methods have focused on shared memory systems, with only a handful of examples pursuing massive parallelism on distributed memory GPU architectures. In the present work, we present a set of distributed memory algorithms for the evaluation of the Coulomb and exact-exchange matrices for hybrid Kohn-Sham DFT with Gaussian basis sets via direct density-fitted (DF-J-Engine) and seminumerical (sn-K) methods, respectively. The absolute performance and strong scalability of the developed methods are demonstrated on systems ranging from a few hundred to over one thousand atoms using up to 128 NVIDIA A100 GPUs on the Perlmutter supercomputer. Comment: 45 pages, 9 figures

arXiv.org e-Print Archive

Modernizing the core quantum chemistry algorithms

Author: Asadchev Andrey
Publication date: 01/01/2012

This document covers the basics of computational chemistry and how, using modern programming techniques, the theory can be efficiently implemented on digital computers. The computer implementations are developed from the core two-electron integrals to many-body and coupled-cluster algorithms. Particular attention is paid to the physical constraints of the computer resources and the emergence of novel architectures.

Digital Repository @ Iowa State University (ISU)

High-performance evaluation of high angular momentum 4-center Gaussian integrals on modern accelerated processors

Author: Asadchev Andrey
Valeev Edward F.
Publication date: 07/07/2023

We present a high-performance evaluation method for 4-center 2-particle integrals over Gaussian atomic orbitals (AOs) with high angular momenta (l ≥ 4) and arbitrary contraction degrees on graphical processing units (GPUs) and other accelerators. The implementation uses the matrix form of McMurchie-Davidson recurrences. Evaluation of the 4-center integrals over four l = 6 (i) Gaussian AOs in double precision (FP64) on an NVIDIA V100 GPU outperforms the reference implementation of the Obara-Saika recurrences (Libint) running on a single Intel Xeon core by more than a factor of 1000, healthily exceeding the 73:1 ratio of the respective hardware peak FLOP rates while reaching almost 50% of the V100 peak. The approach can be extended to support AOs with even higher angular momenta; for low angular momenta alternative approaches will be needed to achieve optimal performance. The implementation is part of the open-source LibintX library, freely available at github.com:ValeevGroup/LibintX

arXiv.org e-Print Archive

New Multithreaded Hybrid CPU/GPU Approach to Hartree−Fock

Author: Asadchev Andrey
Gordon Mark
Publication date: 01/09/2012

Digital Repository @ Iowa State University (ISU)

Fast and Flexible Coupled Cluster Implementation

Author: Asadchev Andrey
Gordon Mark
Publication date: 01/07/2013

A new coupled cluster singles and doubles with triples correction, CCSD(T), algorithm is presented. The new algorithm is implemented in object-oriented C++ and has a low memory footprint, fast execution time, low I/O overhead, and a flexible storage backend with the ability to use either distributed memory or a file system for storage. The algorithm is demonstrated to work well on single workstations, a small cluster, and a high-end Cray computer. With the new implementation, a CCSD(T) calculation with several hundred basis functions and a few dozen occupied orbitals can run in under a day on a single workstation. The algorithm has also been implemented for graphical processing unit (GPU) architecture, giving a modest improvement. Benchmarks are provided for both CPU and GPU hardware. Reprinted (adapted) with permission from Journal of Chemical Theory and Computation 9 (2013): 3385, doi:10.1021/ct400054m. Copyright 2013 American Chemical Society.

Digital Repository @ Iowa State University (ISU)